Stabilization of part-3 of shared-memory support, based on review comments. #616

Draft
wants to merge 12 commits into base: main

Conversation

gapisback
Collaborator

This PR is to exercise CI jobs while applying the review comments received for the change-set under PR #569.

This PR will not be checked in; it will only be used to refresh that other PR's change-set.

gapisback and others added 9 commits January 29, 2024 14:14
…ject mgmt

The main change with this commit is support for free-fragment lists
and recycling of small fragments from shared memory. This was a major
limitation of the support added in previous commits. Another driving
factor for implementing free-list support was that previous
multi-user concurrent-insert performance benchmarking was not functional
beyond a point: we would frequently run into shared-memory Out-Of-Memory (OOM)
errors, even with shmem sizes > 8 GiB (which had worked in a prior dev/perf-test cycle).

The main design changes to manage small fragments are as follows:

Managing memory allocation / free using platform_memfrag{} fragments:

- Allocation and free of memory are dealt with in terms of "memory
  fragments", a small structure that holds the memory's {addr, size}.
  All memory requests (as before) are aligned to the cacheline.

  - Allocation: All clients of memory allocation have to hand in
    an opaque platform_memfrag{} handle, which is returned populated
    with the memory address and, more importantly, the size of the
    fragment that was used to satisfy the memory request.

  - Free: Clients now have to safely keep a handle to this returned
    platform_memfrag{} and hand it back to the free() method.
    free() relies entirely on the size specified in this input
    fragment handle, and the freed memory fragment is returned to the
    corresponding free-list bucket. (See the sketch following this list.)

- Upon free(), the freed fragment is tracked in one of a few free-lists
  bucketed by the size of the freed fragment. For now, we support 4 buckets,
  for sizes <= 64, <= 128, <= 256 & <= 512 bytes. (These sizes are sufficient
  for current benchmarking requirements.) A freed fragment is hung off of
  the corresponding list, threading the free fragments through the
  fragments' own memory. The new struct free_frag_hdr{} provides the
  threading structure. It tracks the current fragment's size and a
  free_frag_next pointer. The 'size' provided to the free() call is
  recorded as the freed fragment's size.

- Subsequently, a new alloc() request is first satisfied by searching the
  free-list corresponding to the memory request. For example, a request
  from a client for 150 bytes will be rounded up to a cacheline boundary,
  i.e. 192 bytes. The free-list for the 256-byte bucket will be searched
  to find the first free fragment of the right size. If no free fragment
  is found in the target list, we then allocate a new fragment.
  The returned fragment will have a size of 256 (for an original request
  of 150 bytes).

- An immediate consequence of this approach is that there is a small,
  but significant, change in the allocation and free APIs; i.e. TYPED_MALLOC(),
  TYPED_ARRAY_MALLOC() and TYPED_FLEXIBLE_STRUCT_MALLOC(), and their 'Z'
  equivalents, which return zeroed-out memory.

- All existing clients of the various TYPED*() memory allocation
  calls have been updated to declare an on-stack platform_memfrag{}
  handle, which is passed back to platform_free().

  - In some places memory is allocated to initialize sub-systems and
    then torn down during deinit(). In a few places existing structures
    are extended to track an additional 'size' field. The size of the
    memory fragment allocated during init() is recorded there and then
    used to invoke platform_free() as part of the deinit() method.
    An example is clockcache_init(), where this kind of work to record
    the 'size' of the fragment is done; the size is passed down to
    clockcache_deinit(), where the memory fragment is then freed with
    the right 'size'. This pattern now appears in many such
    init()/deinit() methods of different sub-systems; e.g. pcq_alloc(),
    pcq_free(), ...

- Cautionary Note:

  If the 'ptr' handed to platform_free() is not of type platform_memfrag{} *,
  it is treated as a generic <struct> *, and its sizeof() will be used
  as the 'size' of the fragment to free. This works in most cases, except
  for some lapsed cases where, when allocating a structure, the allocator
  ended up selecting a "larger" fragment that just happened to be
  available in the free-list. The consequence is that we might end up
  freeing a larger fragment to a smaller-sized free-list. Or, even if
  we do free it to the right-sized bucket, we still end up marking the
  free fragment's size as smaller than what it really is. Over time, this
  may add up to a small memory leak, but it hasn't been found to be crippling
  in current runs. (There is definitely no issue here with over-writing
  memory due to incorrect sizes.)

- Copious debug and platform asserts have been added in the shmem alloc/free
  methods to cross-check, to some extent, illegal calls.
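
To make the calling convention concrete, below is a minimal sketch of the
allocate/free pattern described in this list. The memfrag-taking forms of the
TYPED_*() macros and of platform_free() are assumptions based on the
description above; the exact signatures in platform.h may differ.

```c
/*
 * Illustrative sketch only -- not the definitive interface. Assumes the
 * allocation macro fills the caller's platform_memfrag{} with the address
 * and the actual fragment size handed out, and that platform_free() takes
 * that same handle so the full fragment returns to the right bucket.
 */
#include "platform.h" // assumed project header for these types / macros

typedef struct my_state {
   uint64 id;
   uint64 counters[16];
} my_state;

static platform_status
my_state_example(platform_heap_id hid)
{
   platform_memfrag memfrag = {0};

   // The request is rounded up to a cacheline; a recycled fragment from the
   // matching free-list bucket may be returned, in which case the size
   // recorded in memfrag can be larger than sizeof(my_state).
   my_state *st = TYPED_ZALLOC(hid, st, &memfrag); // assumed memfrag-taking form
   if (st == NULL) {
      return STATUS_NO_MEMORY;
   }

   // ... use st ...

   // Hand back the saved platform_memfrag{}, not a bare pointer, so that
   // free() sees the same size that was allocated.
   platform_free(hid, &memfrag);
   return STATUS_OK;
}
```

For sub-systems that allocate in init() and free in deinit(), the same idea
applies, except the fragment's size is stashed in a field of the owning
structure (the clockcache_init() / clockcache_deinit() pattern above) rather
than in an on-stack handle.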

Fingerprint Object Management:

  Managing memory for fingerprint arrays was particularly problematic.
  This was the case even in a previous commit, before the introduction
  of the memfrag{} approach. Managing fingerprint memory was found to
  be especially cantankerous due to the way filter-building and compaction
  tasks are queued and asynchronously processed by some other
  thread / process.

  The requirements from the new interfaces are handled as follows:

   - Added a new fingerprint{} object, struct fp_hdr{}, which embeds
     at its head a platform_memfrag{}. A few other short fields are
     added for tracking fingerprint memory-mgmt gyrations.

   - Various accessor methods are added to manage memory for fingerprint
     arrays through this object. E.g.,

     - fingerprint_init()   - Allocates the required fingerprint memory for 'ntuples'.
     - fingerprint_deinit() - Dismantles the object and frees the memory.
     - fingerprint_start()  - Returns the start of the fingerprint array's memory.
     - fingerprint_nth()    - Returns the n'th element of the fingerprint array.

  Packaging the handling of the fingerprint array through this object and
  its interfaces helped greatly to stabilize the memory histrionics. (A usage
  sketch follows below.)
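
  As a usage sketch for the accessors listed above: the accessor names come
  from this change, but the parameter lists and the element type shown here
  are assumptions, not the actual prototypes.

```c
// Hedged sketch of fingerprint object usage; prototypes and element type
// (uint32) are assumed -- only the accessor names come from this change.
static void
fingerprint_usage_sketch(platform_heap_id hid, uint64 ntuples)
{
   fp_hdr fingerprint = {0}; // embeds a platform_memfrag{} at its head

   // Allocate fingerprint memory for 'ntuples' entries; the embedded
   // memfrag remembers the address and fragment size for the eventual free.
   fingerprint_init(&fingerprint, hid, ntuples);

   // Access the array only through the object, never via a stale raw
   // pointer, even after the work item has been queued / re-queued to
   // another task-processing thread.
   uint32 *fp_arr  = fingerprint_start(&fingerprint);
   uint32 *fp_last = fingerprint_nth(&fingerprint, ntuples - 1);
   fp_arr[0] = 0;
   *fp_last  = 0;

   // Tear-down frees the fragment using the size recorded at init time.
   fingerprint_deinit(hid, &fingerprint);
}
```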

- When SplinterDB is closed, the shared-memory dismantling routine will tag
  any large fragments that are still found "in-use". This is percolated
  all the way back to splinterdb_close(), unmount() and
  platform_heap_destroy() as a failure return code ($rc). Tests will fail if
  they have left some un-freed large fragments. (A similar approach was
  considered to book-keep all small fragments used/freed, but due to some
  rounding errors it cannot be a reliable check at this time, so it hasn't
  been done.)
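
  For example, assuming (as described above) that the failure surfaces as a
  non-zero return code at splinterdb_close(), a unit test can assert a clean
  shutdown along these lines; the assertion macro is simply whatever the
  test harness provides.

```c
// Sketch: the shared-memory dismantling routine tags still-in-use large
// fragments and the failure percolates up, so a test that leaked a large
// fragment fails at close.
int rc = splinterdb_close(&kvs); // 'kvs' is the handle the test opened
ASSERT_EQUAL(0, rc);             // non-zero => un-freed large fragments
```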

Test changes:

Miscellaneous:
 - Elaborate and illustrative tracing added to track memory mgmt done
   for fingerprint arrays, especially when they are bounced around
   queued / re-queued tasks. (Was a very problematic debugging issue.)
 - Extended tests to exercise the core memory allocation / free APIs,
   fingerprint object mgmt, and the writable_buffer interfaces:
    - platform_apis_test:
    - splinter_shmem_test.c: Adds specific test-cases to verify that
        free-list mgmt is happening correctly.
 - Enhanced various diagnostics, asserts, tracing
 - Improved memory usage stats gathering and reporting
 - Added hooks to cross-check multiple frees of fragments, and testing
   hooks to verify that a freed fragment is relocated to the right free-list
 - Added a diagram for large free-fragment tracking.

Aditya Gurajada added 3 commits January 31, 2024 11:25
Fix assertions in test_large_frag_platform_realloc_to_large_frag()
Corrects test cases so unit_test should now run cleanly.
The following tests are passing in Docker on macOS:

- test.sh passes in both modes
- bin/unit_test passes in release and debug build mode
- INCLUDE_SLOW_TESTS=true ./test.sh run_fast_shared_memory_tests
- BUILD_MODE=debug INCLUDE_SLOW_TESTS=true ./test.sh run_fast_shared_memory_tests